Xinyu (Max) Liu
25 Hastings Ave Unit 1, Waltham, MA 02453 ● 617-682-6844 ● xinyulrsm@gmail.com
EDUCATION AND BACKGROUND
I hold a master's degree in Electrical Engineering (2008) from the University of Pennsylvania. I have written more than
ten thousand lines of code in Python and C, and several thousand lines in C#. I am an expert in data modeling and
analysis using Python, Mathematica, and MATLAB. I am very familiar with machine learning and data mining algorithms,
as well as SQL database techniques. I also have working knowledge of R, Java, and JavaScript.
PROFESSIONAL EXPERIENCE
Data Analyst / Data Scientist (2011 - present)
Arc-Energy, Nashua NH
● Process data analysis. As the key developer, I built a system for managing process data, covering monitoring,
downloading, storage, and analysis. Process data sources include Siemens SCADA and overseas HMIs. The analysis tools
include C#, Python, Excel/VBA, and Mathematica, and the data are stored in a MySQL database. The system has been running
continuously and smoothly for four years, providing useful information and analysis every day for process monitoring
and improvement.
● Data analysis cluster. This project is based on Hadoop and Spark. All run data and simulation data are saved in a
Hadoop cluster. Data processing such as feature calculation is performed with PySpark, which reads the files from HDFS.
The cluster stores more than 100 GB of data and grows by about 2 GB per month. With Pig/Hive and Spark, I can query and
analyze any of the data within an acceptable time (one hour).
● Django applications. I developed a Django application for process recipe checking. Any user on the internal network
can run the recipe check from a web browser. On the back end, the application collects data from SharePoint, Excel files,
and HMIs, compares the recipes, and generates a checking report. The application is running in production and has
identified and reported about two hundred errors, saving the organization about $500K.
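The comparison step of the recipe check can be sketched as a keyed diff between a reference recipe and the recipe actually loaded on a machine. This is an illustrative sketch only. The dict-based `Recipe` model, the `check_recipe` function, the parameter names, and the tolerance are all hypothetical assumptions. The data collection from SharePoint, Excel files, and HMIs is left out.

```python
import math

# Hypothetical recipe model: parameter name -> numeric setpoint. In the
# application described above the values are collected from SharePoint,
# Excel files, and HMIs; here they are plain dicts so the comparison
# step can be shown on its own.
Recipe = dict[str, float]


def check_recipe(reference: Recipe, actual: Recipe, rel_tol: float = 1e-6) -> list[str]:
    """Compare an actual recipe with its reference and return report lines."""
    report = []
    # Walk the union of parameter names in a stable (sorted) order.
    for name in sorted(reference.keys() | actual.keys()):
        if name not in actual:
            report.append(f"MISSING  {name}: expected {reference[name]}")
        elif name not in reference:
            report.append(f"EXTRA    {name}: found {actual[name]}")
        elif not math.isclose(reference[name], actual[name], rel_tol=rel_tol):
            report.append(f"MISMATCH {name}: expected {reference[name]}, found {actual[name]}")
    return report


if __name__ == "__main__":
    # Illustrative parameter names and values only.
    reference = {"pressure_kPa": 5.0, "ramp_rate": 1.5, "temperature_C": 2100.0}
    actual = {"hold_min": 30.0, "pressure_kPa": 5.2, "temperature_C": 2100.0}
    for line in check_recipe(reference, actual):
        print(line)
```

The sketch uses a relative tolerance so that floating-point setpoints read from different sources are not flagged for rounding noise.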
Software Developer and Co-founder (2010 – present)
www.excel-chart.com
Excel users often need to zoom in on a chart to view more detail, but Excel does not provide this functionality. I
developed an Excel Add-In that adds this feature, and it has been well received by the Excel user community.
After installation, a menu named Excel-Chart and its toolbar appear in Excel. The user simply selects a region of
interest in the chart with the mouse and then clicks the toolbar buttons to zoom in or out of the chart.
Data Analysis Engineer (2009 - 2011)
American Superconductor, Devens MA
● Texture data analysis. Led a team that developed texture analysis methods and built a database for the results, using
Python, Mathematica, and VBA. The database provides clean data for further analysis in production management and
product improvement, and is currently in active use at American Superconductor.
● Optical microstructure analysis. Developed software that analyzes optical microstructure images for grain size and
grain size distribution, making the procedure ten times faster than manual analysis.
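Automated grain size measurement of this kind is commonly done by labeling connected grain regions in a segmented image and converting each area into an equivalent-circle diameter. The sketch below uses that approach. It is an assumption rather than the method of the software described above. The function name `grain_sizes`, the 0/1 mask format, and the `pixel_size_um` parameter are hypothetical.

```python
import math
from collections import deque


def grain_sizes(mask: list[list[int]], pixel_size_um: float = 1.0) -> list[float]:
    """Return the equivalent-circle diameter (micrometres) of each grain.

    mask[r][c] == 1 marks a grain-interior pixel and 0 a boundary/background
    pixel (the image is assumed to be segmented already). Grains are
    4-connected regions of 1s, reported in row-major discovery order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    diameters = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill to count one grain's area in pixels.
            area = 0
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                area += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Diameter of the circle whose area equals the grain's area.
            diameters.append(2.0 * math.sqrt(area / math.pi) * pixel_size_um)
    return diameters


if __name__ == "__main__":
    toy_mask = [
        [1, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    print(grain_sizes(toy_mask, pixel_size_um=0.5))
```

The list of diameters can then be summarized (mean, histogram) to describe the grain size distribution.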
Research Associate (2008 - 2009)
Ames Lab of U.S. DOE, Ames IA
● SEM and TEM image analysis. Developed custom tools using ImageJ (a Java image analysis tool/library from NIH) for
image analysis, focusing on grain size analysis of SEM and TEM images.
PROJECTS: Data Mining and Software Development
I. Open source projects on GitHub (https://github.com/maxliu)
· Health care fraud detection using open source data (https://github.com/maxliu/health_care)
In this project, Part D data and Payment data from cms.gov and NPI exclusion
data from hhs.gov are used to build a predictive model for healthcare fraud
detection. A t-test on the training set is used to select drugs as features. The
Part D and Payment data sets are stored in a five-node Hadoop cluster, and Pig scripts
are used to query the data and merge tables. The data processing pipeline, including
scaling and modeling, is implemented with the scikit-learn package. A Random Forest
classifier achieves an average AUC of 0.69. This model should be able to provide
useful clues for fraud detection.
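The t-test feature selection step can be sketched with the standard library alone. Each drug column is scored by the absolute Welch t statistic between the two classes of training rows, and the top-k columns are kept. This is a sketch under stated assumptions. Welch's variant, the 0/1 labels (for example, 1 for providers on the exclusion list), the `select_features` helper, and the toy data are illustrative and not taken from the project code.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic (does not assume equal variances)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: the statistic is 0 or unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_features(rows, labels, names, k):
    """Return the k column names whose values differ most (by |t|) between
    training rows labelled 1 and rows labelled 0."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    score = {name: abs(welch_t([r[j] for r in pos], [r[j] for r in neg]))
             for j, name in enumerate(names)}
    return sorted(names, key=lambda n: score[n], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical training set: claim counts per provider for three drugs.
    names = ["drug_a", "drug_b", "drug_c"]
    rows = [[10, 5, 1], [12, 6, 2], [11, 4, 1], [2, 5, 2], [3, 6, 1], [1, 4, 2]]
    labels = [1, 1, 1, 0, 0, 0]
    print(select_features(rows, labels, names, k=2))  # ['drug_a', 'drug_c']
```

Selection is fitted on the training rows only, so the held-out rows used to compute the AUC do not influence which drugs become features.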
· Scikit-Learn-PyBrain adapter (https://github.com/maxliu/sklearn-Rprop-adapter)
Scikit-learn is widely used in data mining, especially for its Pipeline and parameter search functions. However, I
could not find in it the neural network algorithms I needed for data mining work, such as backpropagation (BP)
networks. Since PyBrain is an excellent library of neural network algorithms, I developed a Scikit-Learn-PyBrain
adapter. The adapter wraps PyBrain functions in the scikit-learn format so that pipelines and parameter search can be
used with PyBrain.
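The adapter idea can be sketched as a class that follows scikit-learn's estimator conventions. Hyperparameters are only stored in `__init__`, `fit` returns `self`, and `predict` maps outputs back to the original labels. With those conventions in place, `Pipeline` and `GridSearchCV` can drive the wrapped model. This is a minimal sketch, assuming binary labels. PyBrain itself is not used. `_ToyNet` is a hypothetical stand-in backend (a single sigmoid unit trained by gradient descent), and its method names are not PyBrain's API.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class _ToyNet:
    """Hypothetical third-party backend (NOT PyBrain's API): one sigmoid
    unit trained with batch gradient descent on log-loss."""

    def __init__(self, n_inputs, learning_rate, epochs):
        self.w, self.b = np.zeros(n_inputs), 0.0
        self.learning_rate, self.epochs = learning_rate, epochs

    def forward(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def train(self, X, y):
        for _ in range(self.epochs):
            grad = self.forward(X) - y  # d(log-loss)/d(logit)
            self.w -= self.learning_rate * X.T @ grad / len(y)
            self.b -= self.learning_rate * grad.mean()


class NetClassifier(ClassifierMixin, BaseEstimator):
    """Adapter exposing the backend through scikit-learn's fit/predict API."""

    def __init__(self, learning_rate=0.1, epochs=200):
        # scikit-learn convention: __init__ only stores hyperparameters,
        # so get_params/set_params (used by GridSearchCV) work unchanged.
        self.learning_rate = learning_rate
        self.epochs = epochs

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, y_idx = np.unique(y, return_inverse=True)  # binary only
        self.net_ = _ToyNet(X.shape[1], self.learning_rate, self.epochs)
        self.net_.train(X, y_idx.astype(float))
        return self

    def predict(self, X):
        p = self.net_.forward(np.asarray(X, dtype=float))
        return self.classes_[(p >= 0.5).astype(int)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (50, 3)), rng.normal(1, 1, (50, 3))])
    y = np.array([0] * 50 + [1] * 50)
    pipe = Pipeline([("scale", StandardScaler()), ("net", NetClassifier())])
    search = GridSearchCV(pipe, {"net__learning_rate": [0.01, 0.1, 1.0]}, cv=3)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)
```

The mixin is listed before `BaseEstimator`, which is the order recent scikit-learn versions expect when resolving estimator tags.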
· Buffer plot for Vim (https://github.com/maxliu/vim_bufplot)
This is a Vim plug-in that lets users visualize their data while editing a file in Vim,
providing an easy-to-use tool for viewing and understanding data. Once the
columns are selected, the “:plot” command plots the data and displays it in a graph. The
code is written in Python and requires the “Matplotlib” package.
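The core of a “:plot”-style command can be sketched in two steps. First, turn the selected text lines into numeric columns. Second, hand those columns to Matplotlib. This is a hedged sketch, not the plug-in's code. The separator rule (commas or whitespace), skipping non-numeric lines such as headers, and the helper names `parse_columns` and `plot_columns` are assumptions. Obtaining the selected lines from Vim's Python interface is omitted.

```python
import re


def parse_columns(lines: list[str]) -> list[list[float]]:
    """Split each selected line on commas/whitespace and return numeric columns.

    Lines containing a non-numeric field (e.g. a header) and blank lines are
    skipped; rows whose field count differs from the first kept row are dropped.
    """
    rows = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or comment line
        if values:
            rows.append(values)
    if not rows:
        return []
    width = len(rows[0])
    rows = [r for r in rows if len(r) == width]
    return [list(col) for col in zip(*rows)]


def plot_columns(columns: list[list[float]]):
    """Plot the first column as x against each remaining column; a single
    column is plotted against its row index. Returns the Matplotlib figure."""
    import matplotlib.pyplot as plt  # imported lazily: only needed when plotting

    fig, ax = plt.subplots()
    if len(columns) == 1:
        ax.plot(columns[0])
    else:
        for ys in columns[1:]:
            ax.plot(columns[0], ys)
    return fig
```

For example, `plot_columns(parse_columns(selected_lines))` followed by `matplotlib.pyplot.show()` would display the graph. Here `selected_lines` stands for the text of the selected lines.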
II. Commercialized Projects
· MS Excel-Chart-Zoom (Software Developer and Co-founder, http://www.excel-chart.com/)
A well-received Excel Add-In that provides an easy-to-use toolbar for zooming in and out
of a region of any Excel chart with the mouse, without any typing. Once the Add-In
is installed, an additional tab called “Excel-chart” appears. After selecting the
region of interest in a chart, the user can zoom along the X-axis, the Y-axis, or both.
The Add-In can also move the plot left, right, up, and down. It was developed in VBA and
C# and works with Excel 2007, 2010, and 2013 on both Windows 7 and Windows XP.
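The zoom and pan behaviour described above reduces to simple arithmetic on axis limits. The sketch below is written in Python for consistency with the other examples, although the Add-In itself was built in VBA and C#. It uses a hypothetical `Limits` model rather than Excel's object model. It also assumes that panning shifts the visible window by a fixed fraction of its size, and that zooming out expands the window about its centre. Both are illustrative conventions and not documented behaviour of the Add-In.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Axis limits of a chart (hypothetical model, not Excel's object model)."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float


def zoom(limits: Limits, sel: Limits, mode: str = "xy") -> Limits:
    """Zoom to a mouse-selected rectangle along "x", "y", or both ("xy")."""
    if mode not in ("x", "y", "xy"):
        raise ValueError(f"unknown zoom mode: {mode!r}")
    x0, x1 = sorted((sel.xmin, sel.xmax))  # the selection may be dragged either way
    y0, y1 = sorted((sel.ymin, sel.ymax))
    new = Limits(limits.xmin, limits.xmax, limits.ymin, limits.ymax)
    if "x" in mode:
        new.xmin, new.xmax = x0, x1
    if "y" in mode:
        new.ymin, new.ymax = y0, y1
    return new


def zoom_out(limits: Limits, factor: float = 2.0) -> Limits:
    """Enlarge the visible window about its centre by the given factor."""
    cx, cy = (limits.xmin + limits.xmax) / 2, (limits.ymin + limits.ymax) / 2
    hw = (limits.xmax - limits.xmin) / 2 * factor
    hh = (limits.ymax - limits.ymin) / 2 * factor
    return Limits(cx - hw, cx + hw, cy - hh, cy + hh)


def pan(limits: Limits, direction: str, fraction: float = 0.1) -> Limits:
    """Move the visible window in `direction` by a fraction of its size."""
    dx = (limits.xmax - limits.xmin) * fraction
    dy = (limits.ymax - limits.ymin) * fraction
    shifts = {"left": (-dx, 0.0), "right": (dx, 0.0), "up": (0.0, dy), "down": (0.0, -dy)}
    if direction not in shifts:
        raise ValueError(f"unknown direction: {direction!r}")
    sx, sy = shifts[direction]
    return Limits(limits.xmin + sx, limits.xmax + sx, limits.ymin + sy, limits.ymax + sy)
```

Keeping the limit arithmetic separate from the chart object means the same functions serve every toolbar button. The only remaining step is writing the new limits back to the chart's axes.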